MOT-APurify v1.3
----------------
MOTOROLA-syntax version.
(c) by Samuel DEVULDER
jan. 1996
Samuel.Devulder@info.unicaen.fr
DESCRIPTION (SHORT):
--------------------
This is the version of APurify for compilers that produce assembly
files in MOTOROLA syntax. As far as I know, all compilers except GCC use
that syntax. If you are using the GCC compiler, read MIT-APurify.doc
instead. This version is primarily written for the DICE compiler, but I
think it can work with other compilers. In the rest of this document,
APurify stands for MOT-APurify, and I assume you are using the DICE
compiler.
APurify is a program that lets you detect bad memory accesses made by
your programs without any special hardware (such as an MMU). It helps
you find bugs caused by accessing memory not owned by your program.
INSTALLATION:
------------
This archive contains the version of APurify for the DICE compiler
as well as for other compilers. Here is a description of the
DICE-related files in the archive, and what to do with each of them to
install APurify.
- doc/MOT-APurify.doc
      The file you are currently reading. Put it with all your doc
      files. It is useful from time to time.
- doc/History
      The whole history (not very useful for most people). Do
      whatever you want with it.
- bin/MOT-APurify
      The parser tuned for the MOTOROLA syntax. Rename it to APurify
      and put it somewhere in your path. That program can be used
      with any compiler that outputs MOTOROLA syntax (i.e. all
      compilers except GCC).
- lib/APur-dcc.lib
      The DICE link-time library. Rename it to APur.lib and put it
      somewhere in your library search path if you are using the
      DICE compiler. It may work well with other compilers (it is a
      COMMODORE-format library). If that library does not suit you
      (undefined labels or similar problems), try APur-pdc.lib. If
      that fails too, please contact me and I'll try to include a
      specific version of the library for your compiler in a future
      release.
- lib/APur-pdc.dir
- lib/APur-pdc.lib
      The PDC link-time library. Rename it to APur.lib and put it
      somewhere in your library search path if you are using the
      PDC compiler. It may work well with other compilers (it is a
      COMMODORE-format library). If that library does not suit you
      (undefined labels or similar problems), please contact me and
      I'll try to include a specific version of the library for
      your compiler in a future release.
- test/test.c
      Source of a silly test program. It is only here to let you
      rebuild the test program. Do whatever you want with it.
- test/test.dcc
      The test program, APurify'ed (DICE-generated file). Run it to
      see how APurify is useful :-).
- test/test.pdc
      The test program, APurify'ed (PDC-generated file). Run it to
      see how APurify is useful :-).
SYNOPSIS:
--------
Usage: APurify [-revinfo] <inputfile> [options]
Where options can be:
-h To display this usage
-? To display this usage
-tb To test memory referenced through base register
-ts To test memory referenced through stack register
-tl To test memory referenced through local stack frame
-tp To test pea instructions
-sd To use BaseRegister-relative mode (small data)
-sc To use PC-relative mode (small code)
-o arg Specifies output file (def=%s)
-br arg Sets the base register (def=A4)
-mp arg Sets the main entry-point (def=_main)
Options can be anywhere on the command line. NOTE: They can no longer
be merged together; they must be separated by spaces. You can pre-define
them with the environment variable AP_MOTP_OPT. For example, if you do:
    CLI> SetEnv AP_MOTP_OPT "-tb -br A5"
then "-tb -br A5" will automatically be added to the command line. The
space between an option and its argument can be omitted. Thus "-br A4"
is the same as "-brA4". Here is a description of the arguments and
flags:
-revinfo  Displays information about APurify (name, size and date of
          the modules, and the number of compilations done for that
          version).
-br arg   Sets the base register used to reference memory in the
          SMALL_DATA model. Usually A4 is used for that purpose, and
          that is the default. If A5 is used instead, add -brA5 to
          your command line.
-tb       Enables checking of all memory referenced through the base
          register (see -br). If you are using the SMALL_DATA model,
          add this flag to your command line. By default, APurify
          does not check memory referenced through the base
          register.
          NOTE: for the safest check, you should always use this
          option, even if you are not in the SMALL_DATA model (A4 may
          be used as a temporary register in that case). To make that
          automatic, you can use the environment variable
          AP_MOTP_OPT.
-ts       Enables checking of memory referenced through the stack
          pointer (SP or A7). By default APurify does not check such
          accesses (to reduce the code size and increase the runtime
          speed). This option will detect when you have no more room
          on your stack (stack overflow).
-tl       Enables checking of memory referenced through the local
          stack frame pointer (the one that is link'ed and unlink'ed
          when entering and exiting a C function). By default, this
          is switched off. This option allows APurify to detect stack
          overflow.
-tp       Enables checking of indirect addresses pushed onto the
          stack by a pea instruction. By default this is off. When
          used, this option checks things like "pea 10(a2)". This
          can help you with memory accessed through a pointer by code
          that has not been APurify'ed. For example, it is useful for
          things like fread(&ptr[10],10,1,fp), because the
          "pea 10(a2)" used to push &ptr[10] onto the stack will be
          checked, and if ptr[10] is not owned by your program you
          will get an APurify error. Please note that this may not
          work all the time, since &ptr[0] can be translated as
          "move.l a0,-(sp)", which is not checked.
-sd       Tells APurify to use the SMALL_DATA addressing model. It
          produces shorter code. APurify will use the base register
          (specified with -br, or A4 by default) to access its data.
          That will only work if your code addresses less than
          64 Kbytes of static data.
-sc       Tells APurify to use the SMALL_CODE addressing model. It
          produces shorter code. APurify will use PC-relative mode to
          access its subroutines. That will only work if your code is
          smaller than 32 Kbytes.
-o arg    Specifies the name of the output file. If omitted, the
          output file is the same as the input file (source file).
          The name can be given as a real name or as a pattern. A
          pattern is a string in which special character sequences
          (called specifiers) are replaced by parts of the input
          file name. Suppose the input file is
              drive:path/file.ext
          Here is a description of the specifiers:
              %s  the full source name:
                      drive:path/file.ext
              %S  the full source name without the extension:
                      drive:path/file
              %b  the full basename:
                      file.ext
              %B  the basename without the extension:
                      file
              %p  the path (the ending "/" or ":" is included):
                      drive:path/
              %e  the extension (the "." is omitted):
                      ext
          Thus, if you put "-o ram:%B-apurify.%e" on the command
          line, the output file will be "ram:file-apurify.ext" in
          our example.
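To make the specifiers concrete, here is a minimal C sketch of how such
a pattern could be expanded. It is not APurify's actual code: the
function name expand_pattern() and its signature are made up for
illustration, and copying an unknown specifier through unchanged is my
own assumption.

```c
#include <stddef.h>
#include <string.h>

/* Expand %s %S %b %B %p %e in "pattern" against the source name "src".
 * Illustrative only; not taken from the APurify sources. */
void expand_pattern(const char *pattern, const char *src,
                    char *out, size_t outsz)
{
    const char *slash = strrchr(src, '/');
    const char *colon = strrchr(src, ':');
    const char *base = src;      /* basename starts after last '/' or ':' */
    const char *dot, *ext;
    size_t stem_len, n = 0;

    if (outsz == 0)
        return;
    if (slash && (!colon || slash > colon))
        base = slash + 1;
    else if (colon)
        base = colon + 1;

    dot = strrchr(base, '.');    /* extension starts after the last '.' */
    stem_len = dot ? (size_t)(dot - base) : strlen(base);
    ext = dot ? dot + 1 : "";

    for (; *pattern && n + 1 < outsz; pattern++) {
        const char *piece;
        size_t len;

        if (*pattern != '%' || pattern[1] == '\0') {
            out[n++] = *pattern; /* ordinary character */
            continue;
        }
        switch (*++pattern) {
        case 's': piece = src;  len = strlen(src);                      break;
        case 'S': piece = src;  len = (size_t)(base - src) + stem_len;  break;
        case 'b': piece = base; len = strlen(base);                     break;
        case 'B': piece = base; len = stem_len;                         break;
        case 'p': piece = src;  len = (size_t)(base - src);             break;
        case 'e': piece = ext;  len = strlen(ext);                      break;
        default:  piece = pattern - 1; len = 2;  /* unknown: keep "%x" */
                  break;
        }
        while (len-- && n + 1 < outsz)
            out[n++] = *piece++;
    }
    out[n] = '\0';
}
```

Called with the pattern "ram:%B-apurify.%e" and the source name
"drive:path/file.ext", this sketch produces "ram:file-apurify.ext",
which matches the example above.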
-mp arg   Tells APurify which label should be considered as the
          entry-point. By default it is set to "_main", and it should
          not be modified.
-?
-h
?         Obvious options.
DESCRIPTION (A BIT LONGER):
--------------------------
As a general rule, at the microprocessor level, there are two ways
to access memory: direct access and indirect access. For example, in C,
direct access can be viewed as accessing global variables, and indirect
access as accessing an array element. More precisely, direct access
means reading or writing a variable whose address is known at compile
time (or once the program has been loaded into memory). Indirect access
is used for variables whose address is determined dynamically by the
program. For example, if p is a pointer to an array allocated by
malloc(), *p is an indirect access. Such an access also occurs for an
expression like T[i] where T is a global array, because the address of
T[i] is not known at compile time: it depends on the value of the index
i. Using indirect access to memory is called indirection.
A well-behaved program must not access memory it does not own. Such
an access is called illegal.
Illegal direct access to memory is not possible because, by
definition, only global variables can be accessed that way and those
variables obviously belong to the program (except for code written in
assembly language that references absolute addresses, for example
"btst #6,$bfe001"; but that kind of code is not good programming
:-)). So we can assume that direct access to memory is always right.
On the other hand, indirect access to memory can certainly be
illegal. Many bugs come from overstepping array boundaries. If the
overstep happens while reading a value, there is not much trouble for
the other running tasks (the error stays inside your task); but if it
happens while writing, you may directly interfere with other tasks and
a big mess can follow (total breakdown of the system).
APurify deals with that kind of access: it verifies the validity of
indirect accesses to memory. It remembers the memory that was allocated
by the program and checks each access against it. One might think that
this makes a lot of tests! Well, yes, but APurify is not designed for
the everyday use of programs, only for test phases. Moreover,
indirections do not actually occur very often. Only array-based
variables produce checked indirections. Variables on the stack,
although accessed by indirection, are not checked by default (see -ts
and -tl) because their access is always safe (at least if there is no
stack overflow!). Also, in the SMALL_DATA model, global variables are
accessed through indirection, but they are not checked unless -tb is
given.
If an illegal access is found, APurify displays an error message on
the error stream of the program. There are two kinds of illegal access.
Some access memory that does not belong to the program at all (this is
called an access between blocks); others access a region that lies
partly in memory owned by the program and partly in memory not owned by
it (this is an overstepping of a block). You can picture it this way:
if [ 1 ] and [ 2 ] represent two blocks allocated by the program and
( 3 ) the memory accessed, then
    ---- [ 1 ] ---- ( 3 ) ---- [ 2 ] ---->
    0                   increasing address
corresponds to the first kind of illegal access and
    ---- [ 1 ( ] 3 ) ---- [ 2 ] ----->
or
    ---- [ 1 ] ---- ( 3 [ ) 2 ] ----->
corresponds to the second kind. The first kind is very common but the
second is quite rare (it is rather a misalignment problem).
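The two kinds of illegal access can be told apart with a simple
interval test. The following C sketch is only an illustration under my
own assumptions: the names struct block, enum access_kind and
classify_access() are hypothetical and say nothing about how APur.lib
stores its block list internally.

```c
#include <stddef.h>

enum access_kind { ACCESS_INSIDE, ACCESS_BETWEEN, ACCESS_OVERSTEP };

struct block {
    unsigned long start;  /* first address of the block */
    unsigned long len;    /* length in bytes */
};

/* Classify the access [addr, addr+len) against a list of owned blocks. */
enum access_kind classify_access(const struct block *blocks, size_t nblocks,
                                 unsigned long addr, unsigned long len)
{
    unsigned long end = addr + len;   /* one past the last accessed byte */
    enum access_kind kind = ACCESS_BETWEEN;
    size_t i;

    for (i = 0; i < nblocks; i++) {
        unsigned long bstart = blocks[i].start;
        unsigned long bend = bstart + blocks[i].len;

        if (addr >= bstart && end <= bend)
            return ACCESS_INSIDE;     /* fully owned: legal access */
        if (addr < bend && end > bstart)
            kind = ACCESS_OVERSTEP;   /* partly inside this block */
    }
    return kind;                      /* no overlap at all: between blocks */
}
```

This sketch treats an access that straddles two adjacent blocks as an
overstep; whether the real library does the same is not stated in this
document.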
APurify has two output modes. One is verbose and tries to give a
lot of information in words. The other is briefer and gives you the
same information, but you have to decode it yourself.
When APurify starts and ends, it outputs the date and time. This is
useful if you are using logfiles. With that, you can keep all your logs
in a single file and find any execution by its date.
In case of an error, APurify displays some text. The first line
looks like this one:
    **** APURIFY ERROR ! [$<N1>(<N2>) <ATTR> (<TEXT1>)] <TEXT2>:
That line describes the accessed memory. <N1> is the hexadecimal
address accessed. <N2> is the length of the access (in decimal). <ATTR>
represents the type of access. <TEXT1> allows you to find where in your
code the illegal access happened. <TEXT2> describes the kind of
illegal access.
If the length (<N2>) is 1, then it was a byte access. 2 stands for
a short access, 4 for an int/long and >4 for a movem instruction.
Attributes, <ATTR>, can be "R--" or "-W-". The first one represents a
read access and the second a write access.
The text <TEXT1> looks like this:
    <NAME>, PC=$<PC#> HUNK=$<HUNK#> OFFSET=$<OFF#>
<NAME> is the name of the subroutine where the error occurred. It is
always displayed (even if it is a "static" one). The rest of the line
may be only partially displayed, showing as much information as APurify
can get. <PC#> is a hexadecimal address pointing to the instruction that
produced the error. <HUNK#> and <OFF#> are the hunk number and the
relative offset of <PC#>. Using <HUNK#> and <OFF#> and a disassembler,
you can very easily find where your code is bad (BTW, I use dobj from
netdcc, (c) by Matt Dillon). Please note that in this new version,
<PC#> no longer points to some instruction before the faulty one. It
is always the real faulty address.
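If you keep logfiles, the first line of an error can be decoded
mechanically. Here is a small C sketch that does so with sscanf(). The
names struct ap_error, parse_error_header(), is_write() and
access_width() are mine, not part of APurify, and the sketch assumes
the line starts exactly as in the pattern above.

```c
#include <stdio.h>
#include <string.h>

struct ap_error {
    unsigned long addr;   /* <N1>: accessed address */
    unsigned long len;    /* <N2>: length of the access in bytes */
    char attr[4];         /* <ATTR>: "R--" or "-W-" */
    char name[64];        /* <NAME>: subroutine where the error occurred */
};

/* Returns 1 if "line" matches the error header pattern, 0 otherwise. */
int parse_error_header(const char *line, struct ap_error *e)
{
    return sscanf(line,
                  "**** APURIFY ERROR ! [$%lx(%lu) %3[RW-] (%63[^,)]",
                  &e->addr, &e->len, e->attr, e->name) == 4;
}

/* Non-zero when the faulty access was a write. */
int is_write(const struct ap_error *e)
{
    return strcmp(e->attr, "-W-") == 0;
}

/* Translate the access length into the kind of access described above. */
const char *access_width(unsigned long len)
{
    if (len == 1) return "byte";
    if (len == 2) return "short";
    if (len == 4) return "long";
    return len > 4 ? "movem" : "unknown";
}
```

Only the beginning of the line is parsed, so the sketch also works when
the PC, HUNK and OFFSET fields are missing or wrapped onto the next
line.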
The remaining lines show the context of the illegal access. They
give you information about the surrounding memory blocks owned by
your program. Each block is displayed according to the following
pattern:
    [$<N1>(<N2>) <ATTR> (<TEXT>)]
where <N1> is the hexadecimal address of the beginning of the block and
<N2> its length (in decimal). Note that the length may seem to be
longer than the one allocated by malloc() and the address may point
before the one you obtained via malloc(). This is not wrong! In fact
you must know that the malloc() subroutine may add some information
(like a doubly-linked list or the length of the allocation) to the
block you have requested. That extra information is put before the
address you receive, which explains this behavior. In this version of
APur-dcc.lib, this takes 8 extra bytes. So if you allocate 10 bytes,
don't be surprised if APurify thinks you have requested 18 bytes.
<ATTR> is made of 3 status characters, RWS,
    where R means: read-enabled block
          W means: write-enabled block
          S means: system block (block not controlled by the program).
If one kind of access is forbidden, the character '-' replaces the
corresponding letter. <TEXT> is the name of the procedure that
allocated the block.
With each block you can find an offset. That offset is the distance
between that block and the faulty address. In verbose mode, you can
see some text explaining the relative position of a block and the
accessed memory. In non-verbose mode you just see the offsets followed
by the blocks. The block with the smallest offset is displayed first,
since it is the one most likely to have been overstepped.
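The offsets can be recomputed by hand. The C sketch below uses a rule
that I inferred from the sample report in the EXAMPLE section (it
reproduces the values -29, +77253, +1 and -79345 shown there); it is an
assumption about the output, not code from APurify, and the function
name gap_to_block() is hypothetical. It only covers the "accessed
between" case, where the access does not overlap the block.

```c
/* Signed distance in bytes between a block and an access outside it.
 * Positive: the access lies above the block (after its last byte).
 * Negative: the access lies below the block (before its first byte). */
long gap_to_block(unsigned long bstart, unsigned long blen,
                  unsigned long addr, unsigned long len)
{
    unsigned long blast = bstart + blen - 1;  /* last byte of the block */
    unsigned long alast = addr + len - 1;     /* last byte accessed */

    if (addr > blast)
        return (long)(addr - blast);
    if (alast < bstart)
        return -(long)(bstart - alast);
    return 0;   /* the access overlaps the block: not handled here */
}
```

For instance, the first error of the sample report reads 4 bytes at
$002727b8 next to the block [$002727d8(23)]: the last byte read is
$002727bb, and $002727d8 - $002727bb = 29, hence the offset -29.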
When an illegal write occurs (indeed the only dangerous thing you
can do by indirection), APurify tells you that the error is really
dangerous and asks if you wish to stop your program. If you wish so,
exit() is called. You can also ignore that error or ignore all such
errors (but then you'll surely meet the guru!).
APurify checks for memory allocated but not freed by the program
(in fact, it detects blocks that are still allocated when the library
is closed).
It knows about memory locations that are independent of the program
execution: the first kilobyte of memory, which contains the interrupt
vectors of the 680x0 processor, the program segments and the stack.
Accessing those blocks is not illegal. They have the S attribute (for
SYSTEM blocks).
It takes into account memory blocks allocated by malloc() and
AllocMem(), and indirectly allocated blocks (by OpenScreen() for
example). I did not test the last kind of allocation, but it should be
OK because APurify patches the AllocMem() and FreeMem() entries. Thus a
program can access the bitplanes of one of its screens without error.
If the program makes an access inside a known block, but the block
attributes are incompatible with the kind of access, a protection-error
message is displayed. Currently only the first kilobyte is
read/write-protected, but that may change in the future.
HOW TO USE APURIFY:
------------------
One can see APurify as a pre-assembler. It must be run on an
assembly language source file just before the assembler. It scans the
file and changes it a bit so that APur.lib can be used.
The normal way to use it for a C program is to:
- compile the C source files and keep the assembly language source (.a).
- run APurify on each .a file.
- assemble each .a file to get a .o file.
- link all .o files together with APur.lib.
For example, using dcc (DICE) on prog.c, that gives:
CLI> dcc -a prog.c -o prog.a
CLI> APurify -tb prog.a
CLI> dcc -s prog.a -o prog -lAPur
As you can see, APurify needs no change to your C files to be used.
In this release you no longer need to call AP_Init() in the main()
function. The call is automatically inserted when the main-entry label
(specified by -mp) is found. You should not use dos.library/Exit() to
abort your program; I think it will crash if APurify is running. If you
must use Exit(), then call AP_Close() just before calling Exit(). The
explanation is simple: since some system functions are patched, if a
program exits without closing the library, those patches will be
corrupted, pointing to code that is no longer in memory, and you'll
meet the guru (i.e. the computer will crash)... (You've been warned :-).
You can disable/enable the printing of messages by calling
AP_Report(flag). If flag is true (i.e. different from zero), printing
is enabled; if it is false (i.e. equal to zero), no output is done.
This is useful for startup code. For example, if you are using the
argv[] array in C, APurify will print a lot of false errors. This is
because the values pointed to by this array are allocated before the
library is opened. You can avoid this by calling AP_Report(0) before,
and AP_Report(1) after, the code that uses argv[].
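As an illustration, here is a hedged C sketch of that idiom. The
prototype "void AP_Report(int flag)" is my assumption (this document
only shows the call itself), and USE_APURIFY is a hypothetical macro:
when it is not defined, a stand-in AP_Report() is compiled so that the
sketch builds without APur.lib. The function count_dash_args() is just
an example of code that reads argv[].

```c
#ifdef USE_APURIFY
/* Assumed prototype: the document only shows the call AP_Report(flag). */
void AP_Report(int flag);
#else
/* Stand-in so the sketch builds without APur.lib; it records the flag. */
int ap_report_enabled = 1;
void AP_Report(int flag) { ap_report_enabled = (flag != 0); }
#endif

/* Count the arguments starting with '-', with reports silenced meanwhile. */
int count_dash_args(int argc, char **argv)
{
    int i, n = 0;

    AP_Report(0);          /* argv[] was allocated before the library opened */
    for (i = 1; i < argc; i++)
        if (argv[i][0] == '-')
            n++;
    AP_Report(1);          /* back to normal reporting */
    return n;
}
```

Keeping the silenced region as small as possible matters: any real
error made between the two calls goes unreported too.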
When debugging an APurify'ed program, you can put a breakpoint on
a function called AP_Err(). AP_Err() is called each time APurify
detects an error. That gives you the opportunity to look at your
program just before a faulty memory access occurs.
You can switch between the verbose output and the shorter one with
AP_Verbose(flag). If flag is true then the verbose mode is on. If it is
false then only short messages are printed. Some people prefer the
latter, so that is the default. If you prefer the verbose output, put
AP_Verbose(1) somewhere in your code and you'll get longer
explanations about illegal accesses.
You can specify a logfile where APurify writes its errors. To do
this, set the environment variable "APlog" (file ENV:APlog) to the name
of a logfile. If this variable is set, APurify appends all its output
to the indicated file.
You can use APurify with any language that generates a temporary
assembly language source file (including assembly itself :-) ). Note
also that you can use it on programs for which no source code is
available (or on .o files without .asm files). For that, use a program
that can do reverse engineering on your executable (i.e. that
disassembles the executable and produces a .asm file ready to be
assembled). Then, with minor changes (prepend '_' and append ':' to
every interesting label, put a call to AP_Init in the right place),
you get a file ready to be processed by APurify. If the processed file
has a HUNK_SYMBOL then you are very lucky and you need not work on
labels. You then just have to find "_main:" and add "jsr _AP_Init"
as the first instruction of the "_main:" subroutine.
Note: you can use ADIS (by Martin Apel), available on Aminet, to do the
reverse engineering (it seems to be quite a good tool for that).
EXAMPLE:
-------
As an example, let's look at the test.dcc program. You'll see how
you can use the APurify report it produces to find what is wrong in the
program. For this, I have included the commented report in this
document. My comments/explanations appear on lines beginning with "#".
**** APurify started on Sun Jan 07 18:41:55 1996
#
# Well, the report started...
#
**** APURIFY ERROR ! [$002727b8(4) R-- (_main, PC=$00286210 HUNK=$0
OFFSET=$240)] accessed between:
-29 [$002727d8(23) RW- (_main)]
+77253 [$0025f9e8(12) RW- (_main)]
#
# Hum... First hit. It is an error while reading something in the main()
# procedure, between two blocks already allocated. The nearest block
# appears first, so we can guess that the error came from accessing an
# array allocated in main() with a negative index. We can look at the
# code to find what is wrong with it. Using DOBJ, we find the following
# code at offset $240 in the first hunk:
#
# 00.00000240 24ab ffd8 MOVE.L -40(A3),(A2)
#
# This corresponds to the C code:
#
# a[0]=b[-10]
#
# Hence we've discovered a first error in the code. Note that -29 is
# the distance (in bytes) between the end of the accessed memory and
# the beginning of the array. This is not the difference between the
# beginning address of the two blocks!
#
**** APURIFY ERROR ! [$0025f9f4(4) R-- (_main, PC=$00286238 HUNK=$0
OFFSET=$268)] accessed between:
+1 [$0025f9e8(12) RW- (_main)]
-79345 [$00272fe8(408) RW- (_main)]
#
# Well... here it seems to be an access just after an allocated block.
# The offset +1 is the distance in bytes between the accessed block and
# an allocated block. The situation is like this:
#
# ---------[ 1 ]( 2 )---------->
#
# Where "[ 1 ]" is the allocated block and "( 2 )" the accessed block.
# If we look in the code, we find:
#
# 00.00000268 4aaa 0004 TST.L 0004(A2)
#
# that corresponds to the test done by "if(a[1] == 0)". This is an error
# since the array 'a' is just 12-8=4 bytes long. So a[1] points out of
# the array!
#
**** APURIFY ERROR ! [$0025f9f2(4) R-- (_read_shifted, PC=$0028611a
HUNK=$0 OFFSET=$14a)] accessed across the ending boundary of:
-2 [$0025f9e8(12) RW- (_main)]
#
# Hehe, another error... Damn! That test program is FULL of bugs!
# Yes, but this one is another kind of error. It is an access across a
# boundary. It occurs in the read_shifted() code. We need not look in
# the asm file to see the error. Here it is a misalignment error.
# Visually that gives:
#
# ------------[ 1(]2 )----------->
#
# [ 1 ] = allocated ( 2 ) = accessed.
#
**** APURIFY ERROR ! [$0025f9f0(4) R-- (_read_long, PC=$00286136
HUNK=$0 OFFSET=$166)] accessed between:
-79349 [$00272fe8(408) RW- (_main)]
+2487793 [$00000000(1024) --S (Basic 680x0 vectors)]
#
# That error is strange! It is not an access to an array with a
# negative index, as one might think at first: we never call read_long()
# in such a way, and the offsets are too big! In fact, the accessed
# memory was valid a moment ago, since it lies in the array 'a' (look at
# the second hit). Hence, it must be an access to free()'d memory. That
# error is then easily found in the code:
#
# free_arg(a); read_long(a).
# ^^^^^^^^^^^^
#
**** APURIFY ERROR ! [$00000004(4) R-- (_read_page_zero, PC=$00286176
HUNK=$0 OFFSET=$1a6)] accessed on a read-protected block:
+4 [$00000000(1024) --S (Basic 680x0 vectors)]
#
# Here the error is obvious: we are reading the zero-page. If it were
# a write, that error would be very dangerous.
#
**** APURIFY WARNING ! Closing library without deallocation of the
following block(s):
- [$00272fe8(408) RW- (_main)]
- [$0029e928(12008) RW- (_main)]
- [$002a1810(40008) RW- (_main)]
#
# The program has exit()ed. APurify tells us that we forgot to free
# those blocks. This is a memory leak. Those blocks were allocated in
# main(), and lost, by
#
# a=malloc(4),malloc(400),malloc(12000),malloc(40000)
#
# since the assignment only affects the first item of ",,,".
#
**** APurify ended on Sun Jan 07 18:41:56 1996
#
# Well... done :-).
#
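The leak found at the end of the report comes from the way C parses
"a=x,y,z": the assignment binds tighter than the comma operator, so
only the first value is stored. The sketch below shows this with a
hypothetical fake_alloc() that merely hands out block numbers (it is
not malloc() and allocates nothing), so that the effect can be observed
without leaking real memory.

```c
static int calls;   /* number of fake allocations made so far */

/* Pretend to allocate "size" bytes; return the number of the new block. */
static int fake_alloc(int size)
{
    (void)size;
    return ++calls;
}

/* Same shape as the statement in test.c: a = f(..), f(..), f(..); */
int comma_assignment_demo(void)
{
    int a;

    calls = 0;
    a = fake_alloc(4), fake_alloc(400), fake_alloc(12000);
    return a;       /* block 1: the other two results were discarded */
}
```

After the call, three blocks have been "allocated" but only the first
one is reachable through a, which is exactly the situation APurify
reports when the library is closed.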
LEGAL PART:
----------
This program is provided 'AS IS'. I am not responsible for any
damage it can cause (but I am responsible for the benefits it can give
you :-). Use this software at your own risk.
This program is FREEWARE. You can use and distribute it as long as
you keep the archive intact (no alteration of files except for
compression). It can't be sold without my agreement (except for a
minimal amount covering the media). You must ask me before any
commercial use of (any part of) this product. I keep all my rights on
this program and its future releases. I can modify this software
without notifying the users.
If you wish, you can send me a postcard or anything else you want
(money, documentation, amiga, hardware stuff, ...) in exchange for
using APurify. But there is no obligation :-). My postal address is:
M. DEVULDER Samuel
1, Rue du chateau
59380 STEENE
FRANCE
(yes, I'm French!). You can send suggestions or bug reports to my email
address:
devulder@info.unicaen.fr
NOTES:
-----
It has been compiled with freedice 2.06.37.
I had the idea for this program after a chat with Cedric BEUST
(AMIGA NEWS) on IRC (Internet Relay Chat). Thanks Cedric!
All trademarks are the property of their respective owners.
There are some programs similar to APurify. For example, FORTIFY
(Simon P. Bullen), but it only detects illegal writes at the boundaries
of allocated blocks. Thus it can't detect big oversteps or oversteps
while reading, and the detection is not real-time. Enforcer can detect
illegal accesses to memory (I think), but it needs special hardware
(an MMU).
HINTS & TIPS:
------------
You can find some memory leaks with this version of APurify. It is
not really good at it, but it can help. A memory leak occurs when a
block of memory is no longer pointed to by your program. Those memory
blocks will necessarily be displayed when your program exit()s. So with
the messages printed on that occasion, you can find such blocks. I know
this is not so great, but I think it can help you a little bit (maybe
in a future version I'll add some code to really check memory leaks).
BUGS:
----
APurify doesn't know about public memory that a program can read or
write without having allocated it. Thus, it will report an error when a
program reads or writes values in a message obtained through a GetMsg()
call. Use AP_Report() to avoid such reports.
It can display messages about closing the library without freeing
some memory blocks. This is due to printf(), which allocates memory that
is only freed on exit. This is not a real bug, and you can avoid it by
calling AP_Report(0) just before exiting. But note that it is better to
display false bugs than to miss real ones.
There are certainly more bugs; I'm waiting for your bug reports.